Abstract. We describe an approach to bounded-memory computation of persistent
homology and Betti barcodes, in which a computational state is maintained with
updates introducing new edges to the underlying neighbourhood graph and
percolating the resulting changes into the simplex stream feeding the
persistence algorithm.

We further discuss the memory consumption and resulting speed and com- 
plexity behaviours of the resulting algorithm. 



1. Introduction 

Persistent homology has turned out to be a powerful tool in data analysis as well 
as several other fields. When used in application areas, the computation of Betti 
barcodes is usually done in an exploratory fashion, with no good a priori intuition 
for what parameters to use for the computation. 

Among the publicly available software, Dionysus, jPlex, and javaPlex
all provide a computational interface in which a particular run is computed in
its entirety, and the internal state subsequently discarded. This is a problem for
exploratory computations, as restarting a computation has a high cost, and little, 
if any, information can be salvaged should the computational parameters chosen 
exceed the capabilities of the platform. 

Here, we propose a solution to these issues: explicitly saving the internal state 
of the computation, and an incremental computational paradigm that generates 
simplices of the Vietoris-Rips complex in the order they appear in the simplex 
stream, and computes persistent homology interleaved with the generation of new 
simplices. 

There are bounds on how memory efficient an implementation of Vietoris-Rips 
and persistent homology can be: since the persistence algorithm pairs simplices, it 
will be difficult to store any less than each simplex once. However, we can discard 
the Vietoris-Rips simplex stream as it is consumed by the persistence algorithm, 
and grow not much beyond the data needed to store the initial stream. This 
is a significant difference from any existing implementation, in which the entire 
stream is generated, and subsequently the entire persistence computation structure 
is generated as well. 



2. Incremental Vietoris-Rips construction 

We propose here a construction of the Vietoris-Rips simplex stream by an
incremental process: assuming we can acquire a stream of edges in the
Vietoris-Rips neighbourhood graph, we consume one edge at a time, and generate
all simplices implied by any given edge just before they are consumed by the
persistence algorithm.

At the core of this approach is the observation that we can maintain a current
list of maximal cliques, and any new maximal clique after adding an edge comes
from the intersection of maximal cliques containing either of the two endpoints of
the edge. Any maximal clique equal to such an intersection joined with one of the
endpoints will vanish in the update, and any other maximal clique stays, as do
the maximal ones among the cliques constructed as such adjoined intersections.

Proposition 2.1. Suppose $\mathcal{C}$ is a system of maximal cliques in a graph
$\Gamma$. Create the graph $\Gamma \cup e$ adjoined with the new edge
$e = (s,t)$. Write $\mathcal{C}_v$ for the maximal cliques in $\Gamma$
containing the vertex $v$.

If $c_s \in \mathcal{C}_s$ and $c_t \in \mathcal{C}_t$, then
$(c_s \cap c_t) \cup \{s,t\}$ is a clique in $\Gamma \cup e$.

Maximal cliques in $\Gamma \cup e$ are cliques in $\mathcal{C}$ or of the shape
$(c_s \cap c_t) \cup \{s,t\}$ that are not contained in any other cliques in
this list.

Proof. There are a few statements to be proven here.

(1) $(c_s \cap c_t) \cup \{s,t\}$ is a clique. Indeed, any vertex in
$c_s \cap c_t$ is adjacent to both $s$ and $t$. Furthermore, these vertices
are all adjacent to each other, and $s$ and $t$ are joined by $e$. Hence,
$(c_s \cap c_t) \cup \{s,t\}$ is a clique in $\Gamma \cup e$.

(2) Any maximal clique in $\Gamma \cup e$ is either maximal in $\Gamma$ or has
the shape $(c_s \cap c_t) \cup \{s,t\}$. Indeed, suppose $c$ is a maximal
clique in $\Gamma \cup e$.

If $e \notin c$, then $c$ is a clique in $\Gamma$. Suppose $c$ is not
maximal in $\Gamma$. Then there is some larger clique $c'$ in $\Gamma$
containing $c$. However, $c'$ is still a clique in $\Gamma \cup e$, which
contradicts the maximality of $c$ in $\Gamma \cup e$.

Suppose now that $e \in c$. Then $c \setminus \{s,t\}$ is a clique in
$\Gamma$. We claim that $c \setminus \{s\}$ and $c \setminus \{t\}$ (the
argument is symmetric) are contained in maximal cliques in $\Gamma$ such
that the intersection of these maximal cliques is exactly
$c \setminus \{s,t\}$. It is immediately clear that $c \setminus \{s\}$ is
contained in at least one maximal clique. Indeed, since $c \setminus \{s\}$
avoids $e$, all its edges are present in $\Gamma$, and thus
$c \setminus \{s\}$ is a clique in $\Gamma$. Similarly, $c \setminus \{t\}$
is contained in at least one maximal clique. It remains to show that
$c \setminus \{s,t\}$ is the intersection of some pair of such maximal
cliques. Suppose there was no such pair that had this as their precise
intersection. Then all intersections between maximal cliques containing
either are strictly larger than $c \setminus \{s,t\}$. Pick some pair of
maximal cliques $c_s \supseteq c \setminus \{t\}$,
$c_t \supseteq c \setminus \{s\}$. Then $(c_s \cap c_t) \cup \{s,t\}$ is a
clique in $\Gamma \cup e$ by the same reasoning as above. Furthermore,
$(c_s \cap c_t) \cup \{s,t\}$ strictly contains $c$. This contradicts the
maximality of $c$ and thus finishes the proof.

□ 

Suppose now we have a set of maximal cliques $\mathcal{C}$ for $\Gamma$ and a
set of maximal cliques $\mathcal{C}'$ for $\Gamma \cup e$. Then the set of
simplices introduced by the introduction of $e$ corresponds to the set of
cliques in $\Gamma \cup e$ that were not cliques in $\Gamma$. Obviously, any
cliques in $\mathcal{C}'$ containing $e$ will be such new simplices, but so
will any subcliques of these cliques that contain $e$.

Indeed, any clique containing $e$ was obviously not present in $\Gamma$.
Conversely, any clique introduced by the addition of $e$ has to contain $e$.

Example 2.2. Consider a graph on the vertices a, b, c, d, e, f, g whose
maximal cliques are abcf, abcg, def and deg. Adding the edge f-g
yields the new cliques abcfg from the intersection $abcf \cap abcg$, fg from
the intersections $abcf \cap deg$ and $def \cap abcg$, and the clique defg
from the intersection $def \cap deg$. We may recognize fg as redundant, since
it is contained in (several of) the other candidate cliques. All previous
maximal cliques are contained in one of the remaining maximal cliques abcfg
and defg.

Furthermore, we may consider faces of abcfg and defg. As soon as either
f or g is removed from a clique, we find a clique of $\Gamma$. Thus, the new
faces are the boolean algebras formed by pointwise union of the powerset of
{a, b, c} or {d, e} respectively with the set {f, g}. Thus, in this case, the
new faces are
fg, afg, bfg, cfg, dfg, efg, abfg, acfg, bcfg, defg, abcfg.
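
The example can be checked mechanically. The following is a minimal Python sketch, not the implementation behind the paper: it hard-codes the four maximal cliques of Example 2.2, intersects every clique containing f with every clique containing g, keeps the maximal intersections, and adjoins the new edge. The helper name `label` and the variable names are illustrative assumptions.

```python
from itertools import combinations

def label(simplex):
    """Render a set of vertex names as a sorted string, e.g. 'abcfg'."""
    return "".join(sorted(simplex))

# Maximal cliques of the graph in Example 2.2, before the edge f-g is added.
max_cliques = [frozenset(c) for c in ("abcf", "abcg", "def", "deg")]
s, t = "f", "g"
edge = frozenset([s, t])

# Intersect each maximal clique containing s with each one containing t.
intersections = {cs & ct
                 for cs in max_cliques if s in cs
                 for ct in max_cliques if t in ct}

# Drop intersections strictly contained in another one (here: the empty set).
maximal = [c for c in intersections if not any(c < d for d in intersections)]
new_cliques = sorted(label(c | edge) for c in maximal)

# New faces: every subset of a maximal intersection, joined with the edge.
new_faces = set()
for c in maximal:
    members = sorted(c)
    for r in range(len(members) + 1):
        for sub in combinations(members, r):
            new_faces.add(label(frozenset(sub) | edge))

print(new_cliques)
print(sorted(new_faces, key=lambda f: (len(f), f)))
```

The first line printed lists the two new maximal cliques, and the second lists the eleven new faces ordered by size.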

We note that this implies an efficient algorithm for generating new simplices: for 
each clique created by adjoining the new edge to a clique intersection, we generate 
all subsets of the intersection clique, and add the edge to each thus generated subset. 
Finally, we remove duplicates from the union of all such simplices. 

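from itertools import combinations

def subsets(clique):
    """Return all subsets of a clique, including the empty set."""
    items = tuple(clique)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def process_edge(max_cliques, source, target):
    """Add the edge (source, target) to the graph.

    Returns the updated list of maximal cliques and the set of
    simplices introduced by the edge.
    """
    edge = frozenset([source, target])
    s_cliques = [cl for cl in max_cliques if source in cl]
    t_cliques = [cl for cl in max_cliques if target in cl]
    intersections = {cls & clt for cls in s_cliques for clt in t_cliques}
    # Keep only intersections not strictly contained in another one.
    max_intersections = [cl for cl in intersections
                         if not any(cl < other for other in intersections)]
    new_cliques = [cl | edge for cl in max_intersections]
    # An old maximal clique survives unless some new clique contains it.
    old_cliques = [cl for cl in max_cliques
                   if not any(cl <= ncl for ncl in new_cliques)]
    simplex_seeds = {sub for cl in max_intersections for sub in subsets(cl)}
    new_simplices = {seed | edge for seed in simplex_seeds}
    return old_cliques + new_cliques, new_simplices

# Initially every vertex is its own maximal clique:
#     max_cliques = [frozenset([v]) for v in vertices]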



3. Interleaved algorithm

The persistence algorithm, as described in [Edelsbrunner et al., 2000] and
subsequently refined in [Zomorodian and Carlsson, 2005], consumes a stream of
simplices, and maintains a state from which at any time both the finished
intervals as well as the half-open intervals still unfinished can be read. We
propose to run the persistence algorithm interleaved with the incremental
Vietoris-Rips algorithm above, thus discarding any information not needed for
the subsequent computation, and staying continuously with a state that can be
easily interpreted at any point along the computation as well as used for
continued computation.
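
To make the interleaving concrete, here is a minimal Python sketch under several stated assumptions; it is not the implementation discussed in this paper. It uses coefficients in Z/2 and a plain column reduction keyed by pivot simplex in place of the marked-simplex and cascade table of [Zomorodian and Carlsson, 2005], applies no dimension cap, and gives every vertex filtration value 0. The names `process_edge`, `IncrementalPersistence` and `interleaved_barcode` are hypothetical.

```python
from itertools import combinations

def subsets(clique):
    """All subsets of a clique, including the empty one."""
    items = tuple(clique)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def process_edge(max_cliques, s, t):
    """Return (updated maximal cliques, simplices introduced by edge s-t)."""
    edge = frozenset([s, t])
    meets = {cs & ct for cs in max_cliques if s in cs
             for ct in max_cliques if t in ct}
    maximal = [c for c in meets if not any(c < d for d in meets)]
    new_cliques = [c | edge for c in maximal]
    kept = [c for c in max_cliques if not any(c <= n for n in new_cliques)]
    simplices = {sub | edge for c in maximal for sub in subsets(c)}
    return kept + new_cliques, simplices

class IncrementalPersistence:
    """Z/2 persistence by column reduction, fed one simplex at a time."""

    def __init__(self):
        self.order = {}        # simplex -> position in the stream
        self.value = {}        # simplex -> filtration value
        self.columns = {}      # pivot simplex -> reduced boundary chain
        self.unpaired = set()  # simplices whose class is still open
        self.intervals = []    # finished (dimension, birth, death) triples

    def add(self, simplex, value):
        simplex = frozenset(simplex)
        self.order[simplex] = len(self.order)
        self.value[simplex] = value
        # Boundary over Z/2: all codimension-one faces (none for a vertex).
        chain = {simplex - {v} for v in simplex} if len(simplex) > 1 else set()
        while chain:
            pivot = max(chain, key=self.order.__getitem__)
            if pivot not in self.columns:
                break
            chain ^= self.columns[pivot]  # add the stored column mod 2
        if not chain:
            self.unpaired.add(simplex)    # the simplex creates a class
        else:
            self.columns[pivot] = chain   # the simplex kills pivot's class
            self.unpaired.discard(pivot)
            self.intervals.append((len(pivot) - 1, self.value[pivot], value))

def interleaved_barcode(vertices, edges):
    """edges: iterable of (value, s, t) triples, sorted by value."""
    persistence = IncrementalPersistence()
    max_cliques = [frozenset([v]) for v in vertices]
    for v in vertices:
        persistence.add([v], 0.0)
    for value, s, t in edges:
        max_cliques, batch = process_edge(max_cliques, s, t)
        # Feed faces before cofaces; the batch is dropped after this loop.
        for simplex in sorted(batch, key=lambda x: (len(x), sorted(x))):
            persistence.add(simplex, value)
    return persistence

# Four points on a square: sides enter at 1.0, diagonals at 1.5.
square = [(1.0, 0, 1), (1.0, 1, 2), (1.0, 2, 3), (1.0, 0, 3),
          (1.5, 0, 2), (1.5, 1, 3)]
state = interleaved_barcode(range(4), square)
finished = [iv for iv in state.intervals if iv[1] < iv[2]]
print(sorted(finished), state.unpaired)
```

At any point between two edges, `state.intervals` holds the closed intervals and `state.unpaired` the simplices whose classes are still open, so the loop can be stopped and resumed without regenerating earlier simplices.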



4. Space usage 

The current best practice for Vietoris-Rips complex construction was described
in [Zomorodian, 2010]. In this approach, first a filtered graph is generated. Then,
from this graph, we generate a Vietoris-Rips complex, and then from this complex,
a persistence barcode with the table in the persistence algorithm.

A full distance graph will take $O(n^2)$ space: at least three values (source
index, target index and distance) need to be stored for each of the
$\binom{n}{2}$ edges available on $n$ vertices.

A full Vietoris-Rips complex, in turn, will consume exponential space in the
number of vertices. However, most analyses work with a $k$-skeleton for some
small $k$, and indeed, this research was originally motivated by attempts to
compute degree 2 homology, and thus working with the 3-skeleton instead of the
more widespread 2-skeleton. This has a memory consumption of $O(n^k)$, since
$\sum_{r=1}^{k} \binom{n}{r}$ tuples need to be stored, and each such $r$-tuple
requires $r$ indices.
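
To get a feeling for these counts, the short Python sketch below evaluates the worst-case number of tuples and of stored indices. It reads the sum above as running over r = 1, ..., k, which is an assumption suggested by the stated bound, and the function names are illustrative.

```python
from math import comb

def skeleton_tuple_count(n, k):
    """Worst-case number of r-tuples, 1 <= r <= k, on n vertices."""
    return sum(comb(n, r) for r in range(1, k + 1))

def skeleton_index_count(n, k):
    """Vertex indices stored if each r-tuple keeps its r indices."""
    return sum(r * comb(n, r) for r in range(1, k + 1))

print(skeleton_tuple_count(1000, 3))   # tuples for n = 1000, k = 3
print(skeleton_index_count(1000, 3))   # indices for the same case
```

For n = 1000 and k = 3 the tuple count is already above 10^8, which is why the constant in front of $n^k$ matters in practice.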

The homology computation, finally, erects a table with marked simplices and
their cascades, as described in [Zomorodian and Carlsson, 2005]. This table stores
each simplex that creates a homology class, and for each such simplex, some number 
of cascade simplices. It is not known, currently, what the behaviour of the cascade 
is in the algorithm: what the expected distribution of cascade storage for a generic 
barcode computation would be. However, since the representatives of any homology 
class can be found in the cascades of paired marked simplices, the cascade storage 
sizes correspond to the storage sizes of generic representatives for homology classes. 

Finally, the barcode produced by the persistence algorithm stores at least a start
and end time for each interval, creating a storage consumption of $O(n^k)$.

All in all, we are faced with a memory consumption of $O(n^k)$ for the entire
computation.

There are some ways we can try to accommodate the storage needs of a
computation. One of them is to use laziness in our computations, and only
generate the things we are about to consume. The $O(n^k)$ storage needs for the
Vietoris-Rips complex can be broken up into pieces that are far smaller, in
fact of a space complexity dependent on the number of new cofaces of any given
edge rather than on the entire simplicial complex. We still face $O(n^k)$ in
storage needs for the persistence algorithm table, but the constant is
significantly smaller in this latter case.

Another way to approach this problem is to use the stratified nature of modern
computers. We have many different kinds of memory: L1 and L2 caches, RAM
memory, swapped out RAM pages, explicit file storage, networked storage. By
writing out anything not explicitly needed for future computation to disk, we can
relieve the load on RAM (currently ranging in sizes from 2G to 64G in most
hardware available to individual researchers) and instead put this load on hard
drives (currently ranging in sizes from 250G to multiple terabytes in
corresponding hardware).

Candidates for relegation to slower storage in the pipeline we describe here are: 

(1) The distance graph. We can generate all $O(n^2)$ pairwise distances, and
write them to disk as we find them. These can then be sorted, if necessary
in-place on disk, to yield a file that stores, say, three values for each of the
$\binom{n}{2}$ distances in question, namely distance, index to the source
vertex and index to the target vertex.

Once the sorted distance graph is stored, it is a simple disk read of a
finite and small amount of data to gain access to the next edge in the
Vietoris-Rips graph.

(2) The persistence intervals. Once a persistence interval is closed, the infor- 
mation about the actual interval is not needed for the further computation. 
The corresponding marked simplex and cascade are needed, but the tuple 
of birth-time, death-time and optionally the associated homology cycle are 
not needed for further work. 
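
The two candidates above can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed file format: the fixed-width record layout (one double and two unsigned 32-bit indices), the tab-separated interval log, and all function names are hypothetical, and for brevity the sort happens in memory rather than in place on disk.

```python
import struct
from itertools import combinations
from math import dist

# One edge record: distance (float64), source index, target index (uint32).
RECORD = struct.Struct("<dII")

def write_sorted_edges(points, path):
    """Write all pairwise distances to `path`, sorted by distance."""
    # Simplification: sorted in memory; a large input would need an
    # external (on-disk) sort instead.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    with open(path, "wb") as out:
        for record in edges:
            out.write(RECORD.pack(*record))

def read_edges(path):
    """Yield (distance, source, target) one fixed-size record at a time."""
    with open(path, "rb") as src:
        while True:
            chunk = src.read(RECORD.size)
            if len(chunk) < RECORD.size:
                return
            yield RECORD.unpack(chunk)

def append_interval(path, dimension, birth, death):
    """Append a closed interval to a log file so it can leave memory."""
    with open(path, "a") as out:
        out.write(f"{dimension}\t{birth}\t{death}\n")
```

Because every record has the same size, fetching the next edge is a single small sequential read, and only one record is resident in memory at a time.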

We cannot escape the $O(n^k)$ storage requirements. However, minimizing
memory residence for any of the irrelevant steps, either by using an
incremental algorithm or by disk residence, will stretch the reach of current
persistent homology work.

These ideas: on-disk storage of the generating graphs and of the resulting in- 
tervals, as well as minimal memory residence for the Vietoris-Rips complex before 
inclusion in the persistence tables, are in no way isolated to this particular im- 
plementation; we expect that all existing implementations will benefit from these 
ideas. 